Exploiting Thesaurus Knowledge in Rule Induction for Text Classification

Authors

  • Markus Junker
  • Andreas Abecker
Abstract

Systems for learning text classifiers have recently gained considerable interest. One technique to implement such systems is rule induction. While most other approaches rely on a relatively simple document representation and do not make use of any background knowledge, rule induction algorithms offer good potential for improvements in both of these areas. In this paper, we show how an operator-based view of rule induction enables the easy integration of a thesaurus as background knowledge. Results with an algorithm extended by thesaurus knowledge are presented and interpreted. The interpretation shows the strengths and weaknesses of using thesaurus knowledge and gives hints for future research.

Similar resources

Discovering Multi-level Classification Rules in Platelet Transfusion Databases

Knowledge Discovery in Databases (KDD) is an emerging field in Artificial Intelligence. Fayyad, Piatetsky-Shapiro, & Smyth define KDD as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. The techniques and algorithms used in the discovery process vary from system to system. In this thesis proposal, I present an algorithm that ind...

Ontologising Relational Triples into a Portuguese Thesaurus

Having in mind the automatic acquisition and integration of knowledge from different heterogeneous resources, this paper proposes several automatic methods for attaching term-based relational triples to the synsets of a thesaurus, without exploiting the extraction context for disambiguation. After using the proposed methods to attach triples, extracted from a Portuguese dictionary, to the synse...

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. As there are often relationships between the labels, extracting these relationships and taking advantage of them during the training or prediction phases ...

Text Mining: Extraction of Interesting Association Rule with Frequent Itemsets Mining for Korean Language from Unstructured Data

Text mining is a specific method for extracting knowledge from structured and unstructured data. The knowledge extracted by the text mining process can be put to further use and discovery. This paper presents a method for extracting information from unstructured text data and discusses the importance of Association Rules Mining, specifically for Korean-language text, and also NLP (Natural Language ...

Incorporating Concept-Based Match into Fuzzy Production Rules

FTP (Fuzzy Template Predicate) is proposed as a template to incorporate concept-based match into fuzzy production languages. A thesaurus augmented in FTP supports the concept-based match, which is more sophisticated than previous fuzzy match mechanisms. Membership functions for fuzzy linguistic variables and fuzzy numbers are used as an interface to the thesaurus. FTP also has self-refining f...

Publication date: 1997